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Abstract 

We consider a model selection estimator of the covariance of a random process. Using 
the Unbiased Risk Estimation (URE) method, we build an estimator of the risk which allows 
to select an estimator in a collection of model. Then, we present an oracle inequality which 
ensures that the risk of the selected estimator is close to the risk of the oracle. Simulations 
' show the efficiency of this methodology. 
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(3 ; 1 Introduction 
(N 

Estimating the covariance function of stochastic processes is a fundamental issue in statistics 
^ I with many applications, ranging from geostatistics, financial series or epidemiology for instance 

C/^ ' (we refer to |Ste99j . |Jou77) or |Cre93j for general references). While parametric methods have 

r~| . been extensively studied in the statistical literature (see |Cre93] for a review), nonparamet- 

"j^ I ric procedures have only recently received attention, see for instance |EPTA08l IBBLMAlOl 

IBBLLMAlOt IBBLAll] and references therein. One of the main difficulty in this framework is 
to impose that the estimator is also a covariance function, preventing the direct use of usual 
nonparametric statistical methods. 
J> ' In this paper, we propose to construct a non parametric estimator of the covariance func- 

. tion of a stochastic process by using a model selection procedure based on the Unbiased Risk 

I Estimation (U.R.E.) method. We work under general assumptions on the process, that is, we 

Tlj" ' do not assume Gaussianity nor stationarity of the observations. 

^ Consider a stochastic process {X {t))^^rp taking its values in M and indexed by T C M'', 

d E N. We assume that E [X [t)] = \/t £ T and we aim at estimating its covariance function 
a (s, t) = E.[X (s) X (t)] < oo for all t,s £ T. We assume we observe Xi (tj) where i £ {1 . . .n} 
and t G {1 . . . n}. Note that the observation points tj are fixed and that the Xj's are independent 
^ , copies of the process X. Set set Xi = (Xi (ti) , . . . ,Xi (tp)) Mi E {1 . . . n} and denote by S the 

?H ' covariance matrix of these vectors. 

Following the methodology presented in [BBLMAlOj . we approximate the process X by its 
projection onto some finite dimensional model. For this, consider a countable set of functions 
i9>^)\ek which may be for instance a basis of (T) and choose a collection of models M. <ZV (A). 
For m <Z M, & finite number of indices, the process can be approximated by 

x{t)^Y. • 

Asm 
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Such an approximation leads to an estimator of S depending on the collection of functions m, 
denoted by Sm- Our objective is to select in a data driven way, the best model, i.e the one close 
to an oracle mo defined as a minimizer of the quadratic risk, namely 

mo G arg mini? (m) = arg minE 

A model selection procedure will be performed using the U.R.E. method, which has been intro- 
duced in |Ste81j and fully described in |Tsy04| . The idea is to find an estimator R (m) of the 
risk which is unbiased, and to select m by minimizing this estimator. Hence, if R is close to its 
expectation, will be an estimator with a small risk, nearly as the best quantity Smo- 

In this work, following the U.R.E. method, we build an estimator of the risk which allows 
to select an estimator of the covariance function. Then, we present an oracle inequality for the 
covariance estimator which ensures that the risk of the selected estimator is not too large with 
respect to the risk of the oracle. 

The paper is organized as follows. In Section [2] we present the statistical framework and 
recall some useful algebraic tools for matrices. The following section. Section [3] is devoted to 
the approximation of the process and the construction of the covariance estimator. Section H] is 
devoted to the U.R.E. method, and provides an oracle inequality. Some numerical experiments 
are exposed in Section [5l while the proofs are postponed to the Appendix. 



2 The statistical framework 

Recall that we consider an M-valued stochastic process, X = {X {t))f^rp, where T is some subset 
of M*^, d G N. We assume that X has finite moments up to order 4 and zero mean. Our aim is 
to study the covariance function of X denoted by a {s, t) = E [X (s) X (t)]. 

Let Xi,...Xn be independent copies of the process X, and assume that we observe these 
copies at some determinist points ti, ...,tp in T. We set xt = (Xi (ti) , . . . ,Xi (tp)) T, and denote 
the empirical covariance of the data by 

1 " J 
n ^-^ 

i=l 

with expectation S = (ct {tj,tk))i^j^k^p- 

Hence, the observation model can be written, in a matrix regression framework, as 

Xix] =^ + Ui G , 1 ^ i ^ n (1) 

Where Ui are i.i.d. error matrices with E [C/j] = 0. 

We now recall some notations related to the study of matrices, which will be used in the 
following. 

Denote by St the subset composed of symmetric matrix in M*^*. 

For any matrix A = {aij) G M'^^*, \\A\^ = tr (^AAJ^ is the Frobenius norm of the matrix 
which is associated to the inner scalar product {A, B) = tr i^AB~^^. 

A' G M*^'^ is a reflexive generalized inverse of A^ that is, some matrix such as A^AA^ = A 
andAA'A = A~ . 

In the following, we will consider matrix data as a natural extension of the vectorial data, 
with different correlation structure. For this, we introduce a natural linear transformation, 
which converts any matrix into a column vector. The vectorization of a A; x n matrix A = 
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{0'ij)i<i<k,i<j<n is the knxl column vector denoted by vec (A), obtained by stacking the columns 
of the matrix A on top of one another. That is uec(A) = [an, a^i, ai2, afc2) •••) Q^in, •••7 Ofcn]"''- 
If A ={aij)i<i<k,i<j<n is a k X n matrix and B =(bij)i<i<p^i<j<q is a p x q matrix, then the 
Kronecker product of the two matrices, denoted hy Af^ B, is the kp x nq block matrix 



A(g)B 



auB 



ttkiB 



dlnB 



O-knB 



For A, B and C some real matrices, we recall the following properties that will be useful in 
our settings. 



Proposition 2.1. 



{A(^B' 

These identities can be found in [Seb08| . 



vec (ABC) = [C' ^Aj vec {B) 
\\A\\ = ||?;ec(^)|| = \\vec{A)\\^^ 
(A Ci)B){C(^D) = (AC) (BD) 
= A'^(S)B^ 



(2) 

(3) 
(4) 
(5) 



Let m £ Ai, and recall that to the finite set {gx}x£m functions gx : T ^ R we associate 
the n X \m\ matrix G with entries gjx = g\ {tj), j = 1, n, X £ m. Furthermore, for each t G T, 
we write Gt = {g\ (t) , A G m)^. For A; G N, denotes the linear subspace of M'^^'^ composed 
of symmetric matrices. For G gM"^!*"!, S (G) is the linear subspace of M"^"" defined by 

5(G) = {g*GT:*g5™}. 

This set will be the natural projection space for the corresponding covariance estimator. 



3 Model selection approach 

The estimation procedure is a two step procedure. First we consider a functional expansion of 
the process and approximate it by its projection onto some finite collection of functions. Then, 
we construct a rule to pick out the best of these estimators among the collection of estimated, 
based on the U.R.E. method. 

In this section, we explain the construction of a projection based estimator for the covariance 
of a process and point out its properties. More details can be found in jBBLMAlOj . 

Consider a process X with an expansion on a set of functions (5a)agA following form 

AeA 

where A is a countable set, and {ax)^^^^ are random coefficients in M of the process X. 

This situation occurs in large number of cases. If we assume that the process takes its values 
in (T) or an Hilbert space, a natural choice of the functions is given by the corresponding 
Hilbert basis (5A)AeA (^)- Alternatively, the Karhunen-Loeve expansion of the covariance 
provides a natural basis. However, since it relies on the nature of the process X, this expansion 
is usually unknown or require additional information on the process. We refer to |Adl90] for 



3 



more references on this expansion. Under other kind of regularity assumptions on the process, 
for instance assuming that the paths of the process belong to some RKHS, other expansions can 
be considered as in (CYIO) for instance. 

Now consider the projection of the process onto a finite number of functions. For this, let 
m be a finite subset of A and consider the corresponding approximation of the process in the 
following form 

X{t)=Y,^x9x{t) (6) 

We note Gm G RP^I™-! where {Gm)j\ = dxitj) and the random vector of RI™-! with 
coefficients (ax))^^^. 

Hence, we obtain that 

X = (^X (h) , ..,X (tp)^ =Gmam 

and 

XX — G'uiCLfnO'rn^m' 

Thus, approximating the process X hy X its projection onto the model m implies approx- 
imating the covariance matrix S by Gm^G^ ^ G rI^IxI™! where ^' = E [0^0^] is some 
symmetric matrix. With previous definitions, that amounts to saying that we want to choose 
an estimator in the subset S (Gm) for some subset m of A. 

Assume that the subset m is fixed. The best approximation of S in 5 (Gm) for the Frobenius 
norm is its projection denoted by S.^. But S is unknown, hence we can not determinate this 
quantity. A natural idea is to study the projection of on 5 (Gm)- We denote this quantity by 

Proposition 3.1 in [BBLMAIO] gives an explicit form for these projections. We recall it for 
sake of completeness. 

Proposition 3.1. Let A in RP^p and G G RP^I'"!. The infimum 

inf{p-r||;rGcS(G)} 

is achieved at 

f = G (g^g) " gT (^^^) G (g^g) " G^ 

In particular, if A £ Sp, the projection of A on 5(G) is HAH with the projection matrix H = 
G{G^Gy G'^ G RP^'P. 

It amounts to saying that inf {||^ — G^'G"'" || ; ^' G is reached at 

^= (G'G) G' ( 1 G(G'G 



2 

Remark 3.2. Thanks to the properties of the reflexive generalized inverse given in {RaolS'^ , the 
projection of a non-negative definite matrix A £ Sp on S (G) will he also a non-negative definite 
matrix. Moreover, the matrix 11 does not depend on the choice of the generalized inverse. 

Thanks to this result, the projection of S on 5 {Gm) can be characterized as 

= nmSIIm (7) 

and the same for S (that is, our candidate for estimating S) 

5^771 = ^mS^m (8) 

where = Gm {Gj^Gm) Gj^. 

Hence, the estimator is a covariance matrix. Now, our aim is to choose the best subset 
m among a collection of candidates. 
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4 Model selection with the U.R.E. method 



Let be a finite collection of models m. In this section, we focus on picking the best model 



among this collection by following the U.R.E. method. Since the law of 
we thus aim at finding an estimator of its expectation. 
We consider that the best subset m is mo defined by 



is unknown, 



rriQ £ argminE 



Then the oracle is defined as the best estimate knowing all the information, namely Smo- 
r - 2] 

Set R{m) = E S — . First, we compute this quantity. 

Proposition 4.1. 



E 



,2 I ^^((n. 



n„,) 



n 



(9) 



where ^ = Var {yec (xx^)) . 



Here we can note the similarity with the usual risk for standard estimation models. For 
instance, assume that we observe a Gaussian model with observations a vector Y S such as 

Y = e + ei e~AA(0,I„) 

where e E M and 9 G is the unknown quantity to estimate, using the projection 6^ of the 
vector Y onto some subspace Sm- If the subspace dimension is denoted by Dm-, the risk of a 
such estimator is given by 



E 



We thus recognize the same kind of decomposition with a bias term and with *^((^"'^^'")^) 
playing the role of the variance term Dm/n with e = 1/y/n. Hence it is natural to extend the 
Unbiased Risk Estimation procedure of previous Gaussian model to the matrix model obtained 
by the vectorization of Model ([1]). 

Now, we present an estimator of the risk. We assume n ^ 3, and we set : 



7^ 
im 



1 



n 



1=1 



Proposition 4.2. 



S-T.r 



2 -2 

+ 2^ + C is an unbiased estimator of the risk, where C does not 



depend on m. More precisely : 









2 ^2 1 

+ 2^ 
n 








2" 


E 






= E 









tr{<^) 



n 



Note that the constant 



is unknown but does not depend on m. So in the URE procedure 

2 -2'^ 

minimizmg I S — +2-^ with respect to m is equivalent to minimizing 5 — S 
which is unbiased. 

Then we can define the estimator S of S by 



2 -2 
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with 771 £ argmin 



2 ^2 

n 



The next theorem estabhshes an oracle inequahty for this estimator. 
Theorem 4.3. For all A> we have : 



E 



S - S 



^ (1 + ^"^) inf E 



+ 



n 



(4 + A) 



Hence we have obtained a model selection procedure which enables to recover the best 
covariance model among a given collection. This method works without strong assumptions 
on the process, in particular stationarity is not assumed, but at the expend of necessary i.i.d 
observations of the process at the same points. 

We point out that this study requires a large number of replications n with respect to the number 
of observation points p. Actually our method is not designed to tackle the problem of covariance 
estimation in the high dimensional case p » n. This topic has received a growing attention 
over the past years and we refer to [IBLQSJ and references therein for a survey. 

The proof of these results are using the vectorization of the matrices involved here. That is 
why we must deal with the matrix = var {yec (xx"'")) . It is postponed to the appendix. 



5 Numerical examples 

In this section we illustrate the behaviour of the covariance estimator E with programs imple- 
mented using SCILAB. We aim at knowing if our procedure leads to choose the best model, 
that is the model minimizing the risk. 

Recall that n is the number of copies of the process and p is the number of points where 
we observe these copies. Here, we consider the case where T = [0; 1] and A is a subset of N. 
For sake of simplicity, we identify m and the set 1, . . . ,m. Moreover, the points (ij)i<j<p are 
equi-spaced in [0; 1] . 

For a given process X, we must start by the choice of the functions of its expansion. Their 
knowledge is needed for the matrix Gm- Indeed, {Gm)j\ = 9\ (tj)- 

The method is the following: First, we simulate a sample for p and n given. Second, for m 
between 1 to some integer M, we compute the unbiased risk estimator related to the model m. 
Finally, we pick out a rh minimizing this estimator and we compute S. 

For each example, we plot the curve of the risk function and give its minimum rriQ. We plot 
also the curve of the function of the risk estimator and give its minimum rh. Finally we compare 
the true covariance and the estimator. 



Example 1 



Here we work with the numerical examples of [BBLMAlOj . We choose the Fourier basis func- 
tions: 

si A = 1 



1 

VP 



\/2-^ cos(27r^t) si A est pair 

sin(27r^^t) si A est impair 



6 



And we study the following process : 



A=l 

where ax are independent Gaussian variables with mean zero and variance V{ax). Let D(y) 
the diagonal matrix in m* x m* such as D{V)x\ = V{ax). Then we have 

S = G„,*D{V)Gl. 

Here are the results for V{ax) = 1 V A. We choose m* = 35 = p, n = 50, M = 31. Here 
it can be shown that the minimum of the risk is achieved at § — 1, so in this setting we have 
mo = 24. Here the minimum of the estimator is the same : m = 24. 



25 30 35 





Here are the results for V{ax) = 0.0475 + 0.95^ V A, and m* = 35 = p, n = 60, M = 34. 
Here the figures show that mo = rfi = 18. 



Example 2 



Now we test our estimator with the process studied in [CYlOj . 
We consider the functions 



And the process X studied is 



gx{t) = cos(A7ri) 



X{t) = ^axCx9xit) 

A=l 



where ax are i.i.d. random variables following the uniform law on [— \/3; V3\ and (^x = ^ 
If D is the diagonal matrix with entries Dxx = ^ , as before we have that 



A+l 



S — Gm* DG^* 
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Here we choose m* = 50, n = 1000, p = 40 and M = 20. We found niQ = A = rh. 



5 ID 15 20 25 30 35 



10 15 20 25 30 35 40 45 50 





Example 3 



Here we consider the case of the brownian bridge with its Karhunen Loeve expansion. Indeed, 
this expansion 



X{t) = Y,Zx^x9x{t) 



X>1 



IS 



computed in |SW86j . p. 213-215 : ux = (^)^ and gx{t) = \/2 sin(A7rt). 



The covariance function of the brownian bridge is K{s,t) = s(l — t) for s ^t. Simulate the 
sample is the same as simulate n gaussian vectors of covariance matrix S = {K{ti,tj))i^i_js^p. 
Here n = 100, p = 35 and M = 20. We found niQ = 5 = rh. 
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Note that for the first and the last example,the size of the sample is not so large. However, 
for each of the simulated examples, the covariance estimator shows good performances. Indeed, 
the procedure introduced in this paper leads each time to the selection of the best model, in the 
sense that the chosen model minimizes the risk. 



6 Appendix 



Recall that = Hm Slim , = 11^ SUm and 



n 



1 " II 

1=1 



We start by proving the proposition 14. 1[ 



Proof. Using the orthogonality, we have 

2 

With the proposition 12.11 we deduce 

2 



|5j — Smil + 



vec(T.jn-^m) = n„ (g) n„ wee (S - 5) 



Since Hm is a projection matrix, 

. 2 



tr ( (Jim (X" Ilm) vec (S — 5) vec (S — 5) 



Hence 



E 



|S-S^ir + E 



tr (H^ (g) Hm) t;ec (S - S) vec (S - 5) 
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S - S^f + tr ( (n^ <^ n^) E vec (S - S) vec (S - S) 



tr ( {Hjn tX" Hm) E vec (S — xx~^) vec (S — xx~^) 

lis - s„f + ^ L 



n 



□ 



Proof of Proposition 14.21 

Proof. We start by the proof of the fohowing lemma 
Lemma 6.1. 7^ is an unbiased estimator of tr {{Hm ^ ^m) 

Proof. We deduce from the proposition 12.11 and the fact that 11^ is a projection matrix that: 

2" 



vec ( JlmXiX^ Urn I " i^ec ( 



(n-l)E[7^] = J^E 

i=l 

n r 

= ^E \^{Iim®^m){vec(^Xix]^ -vec{S)^ 

i=l L 

" r / 

^ E tr ( (Ilm (g) Ilm) i^ec (^XixJ^ - vec (5)^ {vec {xixj^ - vec {S)^ (Em ® U 
i=i L ^ 

" / r 

tr ( (Ilm Hm) E ^wec ^XjX^^ — vec {S)^ {vec {xixj^ — vec {S)^ 



i=l 



But if (?^i)i<j<„, are some i.i.d. vectors with covariance matrix V and mean v = - ^^Li ^i) 
we have 



E 



{Vi - v) (Vi - v) 



1 

j,k=l 

1 " 

-2 ^ E - Vk) {vi - Vj) 



j,k=i 



^ {(n - 1) E [(t-i - V2) {vi - V2)'^] + (n - 2) (n - 1) E [{vi - V2) (vi - vsf] } 



{(n - 1) 2y + (n - 2) (n - 1) 



n 



^((n-l)ny) 



Hence 



this identity gives 



E 



(vi - v) {vi -vY =- ((n - 1) V) 



n 



Finally 



(n-l)E[7^] = J^trf (n™C5n^)-((n-l)$) 



E [7^] = ((n„ ® n„) 



□ 
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Now, it remains to show that 

2" 



E 



We have that 



S-K 



S — Yjr, 



|5j — iimSiim 



2 _ tr ((n^ n^) ^) ^ tr(^) 



n 



n 



5-Sr + 2(5-S,S-S„) + 



And using the orthogonahty we deduce that 

2 



S - Sf + 2 (5 - S,S - S^) + ||S - S^f + 



5- S 
For the same reason : 

s* — s, s — Sm") = (s* — s, s — s-m) + ('5' — S, — s, 

. 2 

And because the expectation of S is equal to S we obtain that 



E 



is-s^ir +E 



15- SI 



E 



First 



E 



IS- SI 



E/ . T _ . T _ 



n 









2" 






- S 





And with the properties of the Frobenius norm 

2" 



E 



xx^ — S 



E 



vec ( xx""" — S 



E 



then we derive that 



Thus 



veci^x^^ — t;ec(S)^ (^eci^xx^^ — fec(S)^ 

tr ($) 



E 







2" 




xx^ — S 





E 



15-Sl 



n 



Second 



E 



n 



Sm SjT^ 



4^ 



Ilm (xx"^ - S I Ilm,!!™^ (xx"^ - S ) 11^ 









)n„) 


= iE 






n 





( xx"^ - SI n, 



And using the proposition 12.11 and the specificity of 11^, we obtain that 



E 



Ilm ( Xx"^ - S I Ur, 





2' 






)n^ 




= E 





vec ( Urn I Xx'^ — S | 11, 
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E 



Hm (8) Hm I vec ( XX — T, 



E 



tr ( ® ( vec ( xx'^ — S ) ( wee ( xx'^ ~ ^ ) ) <X' 11^) 



E 



tr I (Ilm <S5 Hm) I wee I —Sill wee (xx — S 



Hence 



And we obtain 



E 



n„,, xx' -s n, 



E 





2" 


)nm 





Finally, we have 



E 



S — Sr 



tr((n„on„)$) 
tr ((n„ n„) 



n 



+ 



n 



n 



Proof of Proposition 14.31 

- 2 

Proof. As — ^ 0, we have: 









2' 








2 ^2 ■ 

n 




E 




S - S 




^ E 






+ 2E 



Let mo S argminE 



S - 5, 5 - S 
an oracle. By definition of m, 



+ E 



\s-n 



S — Yjr, 



2 ^2 
+ 2^ ^ 

n 



5- S, 



mo 



2 ^2 

n 



Then 



E 



S - S 



^ E 



5- S, 



mo 



2 ^2 



■E 



15- SI 



+ 2E 



S - 5, 5 



we derive from the previous proposition and (|T0 
E 







2' 








2" 




s - s 




^ E 




S — 5]mo 





+ 2—^ + 2E 

n 



S - 5, 5 - S 



Moreover by the Cauchy-Schwarz inequality we have that 



S-5,5-S) ^ 



S - 5 



And using again this inequality 



E 



S - 5, 5 - S 



^ WE 



S-5 



E 



< WE 





±-s 


2 ^2 - 
+ 2^ 









tr ($) 



n 
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For the same reasons as before we obtain 



E 



S - 5, 5 - S 



^ 4 E 



S - E 



mo 



+ 



ir($) /ir($) 



n V n 







2' 













Itr ($) 



Thus 



E 







2" 








2 




s - s 




^ E 























,tr I 



n 









2' 




r 











With the fohowing inequality which holds Va, 6 € M et V^l > 

„2 



2a6 < ^ + 
A 



We obtain for all A > 0: 

E 

The definition of mo gives the result. 







2" 








2" 




s - s 




^ E 









'tr ($) 



n 



(l + A-) + ^(4 + A) 



□ 
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